Named Entity Recognition in Hindi using Maximum Entropy and Transliteration

نویسندگان

  • Sujan Kumar Saha
  • Parthasarathi Ghosh
  • Sudeshna Sarkar
  • Pabitra Mitra
چکیده

(NER) system becomes challenging if proper resources are not available. Gazetteer lists are often used for the development of NER systems. In many resource-poor languages gazetteer lists of proper size are not available, but sometimes relevant lists are available in English. Proper transliteration makes the English lists useful in the NER tasks for such languages. In this paper, we have described a Maximum Entropy based NER system for Hindi. We have explored different features applicable for the Hindi NER task. We have incorporated some gazetteer lists in the system to increase the performance of the system. These lists are collected from the web and are in English. To make these English lists useful in the Hindi NER task, we have proposed a two-phase transliteration methodology. A considerable amount of performance improvement is observed after using the transliteration based gazetteer lists in the system. The proposed transliteration based gazetteer preparation methodology is also applicable for other languages. Apart from Hindi, we have applied the transliteration approach in Bengali NER task and also achieved performance improvement.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Named Entity Translation with Web Mining and Transliteration

This paper presents a novel approach to improve the named entity translation by combining a transliteration approach with web mining, using web information as a source to complement transliteration, and using transliteration information to guide and enhance web mining. A Maximum Entropy model is employed to rank translation candidates by combining pronunciation similarity and bilingual contextu...

متن کامل

A Hybrid Feature Set based Maximum Entropy Hindi Named Entity Recognition

We describe our effort in developing a Named Entity Recognition (NER) system for Hindi using Maximum Entropy (MaxEnt) approach. We developed a NER annotated corpora for the purpose. We have tried to identify the most relevant features for Hindi NER task to enable us to develop an efficient NER from the limited corpora developed. Apart from the orthographic and collocation features, we have expe...

متن کامل

Maximum Entropy Approach for Named Entity Recognition in Bengali and Hindi

This paper reports about the development of a Named Entity Recognition (NER) system in two leading Indian languages, namely Bengali and Hindi using the Maximum Entropy (ME) framework. We have used the annotated corpora, obtained from the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL) and tagged with a fine-grained Named Entity (NE) tagset of twelve tags. An appropr...

متن کامل

A Hybrid Approach of English- Hindi Named-entity Transliteration

In recent years, machine transliteration has gained a center of attention for research. Both machine translation and transliteration are important for e-governance and web based online multilingual applications. As machine translation translate source language to target language which results in wrong translation for named entities. Named entities are required to be translated with preserving t...

متن کامل

Optimizing Transliteration for Hindi/Marathi to English Using only Two Weights

Machine transliteration has received significant research attention in last two decades. It is observed that Hindi to English and Marathi to English named entity machine transliteration is comparably less studied. Currently, research work in this domain is carried out by using grapheme based statistical approaches. But, to achieve better accuracy for the transliteration, an adequate bilingual t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Polibits

دوره 38  شماره 

صفحات  -

تاریخ انتشار 2008